Abstract



Learning latent structure in complex networks has become an important problem,
fueled by many types of networked data originating from practically all fields
of science. In this paper, we propose a new non-parametric Bayesian
multiple-membership latent feature model for networks. Contrary to existing
multiple-membership models, which scale quadratically in the number of vertices,
the proposed model scales linearly in the number of links, admitting
multiple-membership analysis in large scale networks. We demonstrate a connection
between the single membership relational model and multiple membership models and
show on "real" size benchmark network data that accounting for multiple memberships
improves the learning of latent structure as measured by link prediction, while
explicitly accounting for multiple membership results in a more compact
representation of the latent structure of networks.
1 Introduction

The analysis of complex networks has become an important challenge spurred by the many types
of networked data arising in practically all fields of science. These networks are very different in
nature, ranging from biological networks such as protein interaction [18, 1] and the connectome of
neuronal connectivity [19] to the analysis of interaction between large groups of agents in social and
technology networks [14, 19, 9, 20]. Many of the networks exhibit a strong degree of structure; thus,
learning this structure facilitates the understanding of network dynamics, the identification of
link density heterogeneities, and the prediction of "missing" links.

We will represent a network as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{Y})$ where $\mathcal{V} = \{v_1, \ldots, v_N\}$ is the set of vertices
and $\mathcal{Y}$ is the set of observed links and non-links. Let $Y \in \{0, 1, ?\}^{N \times N}$ denote a link (adjacency)
matrix where the element $y_{ij} = 1$ if there is a link between vertex $v_i$ and $v_j$, $y_{ij} = 0$ if there is not a
link, and $y_{ij} = ?$ if the existence of a link is unobserved. Furthermore, let $\mathcal{Y}_1$, $\mathcal{Y}_0$, and $\mathcal{Y}_?$ denote the
set of links, non-links, and unobserved links in the graph respectively.

Over the years, a multitude of methods for identifying latent structure in graphs have been proposed,
most of which are based on grouping the vertices for the identification of homogeneous regions.
Traditionally, this has been based on various community detection approaches where a community
is defined as a densely connected subset of vertices that is sparsely linked to the remaining network
[11, 17]. These structures have for instance been identified by splitting the graph using spectral
approaches, analyzing flows, and through the analysis of the Hamiltonian. Modularity optimization
[15] is a special case that measures the deviation of the fraction of links within communities from
the expected fraction of such links based on their degree distribution [13, 17]. A drawback of these
types of analyses, however, is that they are based on heuristics and do not correspond to an underlying
generative process.



Figure 1: Left: Example of a simple graph where each of the vertices has multiple memberships
indicated by colors. Right: The corresponding assignment matrix.



Probabilistic generative models: Recently, generative models for complex networks have been
proposed where links are drawn according to conditionally independent Bernoulli densities, such
that the probability of observing a link is given by $\pi_{ij}$,

$$p(Y \mid \Pi) = \prod_{(i,j) \in \mathcal{Y}} \pi_{ij}^{y_{ij}} (1 - \pi_{ij})^{1 - y_{ij}}. \qquad (1)$$

In the classical Erdős-Rényi random graph model, each link is included independently with equal
probability $\pi_{ij} = \pi_0$; however, more expressive models are needed in order to model the complex latent
structure of graphs. In the following, we focus on two related methods: latent class and latent feature
models.

Latent class models: In latent class models, such as the stochastic block model [16], also denoted
the relational model (RM), each vertex $v_i$ belongs to a class $c_i$, and the probability, $\pi_{ij}$, of a link
between $v_i$ and $v_j$ is determined by the class assignments $c_i$ and $c_j$ as $\pi_{ij} = \rho_{c_i c_j}$. Here, $\rho_{k\ell} \in [0, 1]$
denotes the probability of generating a link between a vertex in class $k$ and a vertex in class $\ell$.
Inference in latent class models involves determining the class assignments as well as the class link
probabilities. Based on this, communities can be found as (groups of) classes with high internal and
low external link probability.

In the model proposed by [7] (HW) the class link probability, $\rho_{k\ell}$, is specified by a within-class
probability $\eta_c$ and a between-class probability $\eta_n$, i.e., $\rho_{k\ell} = \eta_n (1 - \delta_{k\ell}) + \eta_c \delta_{k\ell}$. Another intuitive
representation, which we refer to as DB, is to have a shared between-class probability but allow for
individual within-class probabilities, i.e., $\rho_{k\ell} = \eta_n (1 - \delta_{k\ell}) + \eta_k \delta_{k\ell}$. Both of these representations
are consistent with the notion of communities with high internal and low external link density, and
restricting the number of interaction parameters can facilitate model interpretation compared to the
general RM.
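
As an illustration, the HW and DB parameterizations can be written as small constructors for the class link probability matrix. This is only a minimal sketch: the function names `hw_rho`, `db_rho`, and `link_probability`, and all numeric values, are assumptions made for the example and do not come from the paper.

```python
def hw_rho(K, eta_c, eta_n):
    """HW: one shared within-class and one shared between-class probability."""
    return [[eta_c if k == l else eta_n for l in range(K)] for k in range(K)]


def db_rho(eta_within, eta_n):
    """DB: individual within-class probabilities, shared between-class probability."""
    K = len(eta_within)
    return [[eta_within[k] if k == l else eta_n for l in range(K)] for k in range(K)]


def link_probability(rho, c_i, c_j):
    """Latent class model: the link probability is rho[c_i][c_j]."""
    return rho[c_i][c_j]


# Hypothetical values for three classes.
rho_hw = hw_rho(3, eta_c=0.9, eta_n=0.05)
rho_db = db_rho([0.9, 0.6, 0.3], eta_n=0.05)
```

Both constructors use at most K + 1 free parameters, compared with the K(K + 1)/2 of a general symmetric RM, which is the restriction the text refers to.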

Based on the Dirichlet process, [9, 20] propose a non-parametric generalization of the stochastic
block model with a potentially infinite number of classes, denoted the infinite relational model (IRM)
and the infinite hidden relational model, respectively. The latter generalizes the stochastic block model
to simultaneously model potential vertex attributes. Inference in the IRM jointly determines the number
of latent classes as well as class assignments and class link probabilities. This approach readily
generalizes to the HW and DB parameterizations of $\rho$.

Latent feature models: In latent feature models, the assumption that each vertex belongs to a
single class is relaxed. Instead it is assumed that each vertex $v_i$ has an associated feature $z_i$, and
that probabilities of links are determined based on interactions between features. This generalizes
the latent class models, which are the special case where the features are binary vectors with exactly
one non-zero element.

Many latent feature models support the notion of discrete classes, but allow for mixed or multiple
memberships (see Figure 1 for an illustration of a network with multiple class memberships). In the
mixed membership stochastic block model (MMSB) [1] the vertices are allowed to have fractional
class memberships. In binary matrix factorization [11] multiple memberships are explicitly modeled
such that each vertex can be assigned to multiple clusters by an infinite latent feature model based
on the Indian buffet process (IBP) [6]. [12] study this approach for the specific case of a Bernoulli
likelihood, Eq. (1), and extend the method to include additional side information as covariates in
modeling the link probabilities. In their model, the probability of a link $\pi_{ij}$ is specified by
$\pi_{ij} = f_\sigma\left(\sum_{k\ell} z_{ik} z_{j\ell} w_{k\ell} + s_{ij}\right)$, where $f_\sigma(\cdot)$ is a function with range [0, 1] such as a sigmoid, and $w_{k\ell}$
are weights that affect the probability of generating a link between vertices in cluster $k$ and $\ell$. The
term $s_{ij}$ accounts for bias as well as additional side-information. For example, if covariates $\phi_i$ are
available for each vertex $v_i$, [12] suggest including the term $s_{ij} = \beta d(\phi_i, \phi_j) + \beta_i^\top \phi_i + \beta_j^\top \phi_j$,
where $\beta$, $\beta_i$, and $\beta_j$ are regression parameters, and $d(\cdot, \cdot)$ is some possibly nonlinear function.

In general, the computational cost of the single membership clustering methods mentioned above
scales linearly in the number of links in the graph. Unfortunately, existing multiple membership
models [1, 11, 12] scale quadratically in the number of vertices, because they require explicit
computations for all links and non-links. This renders existing multiple membership modeling approaches
infeasible for large networks. Furthermore, determining the multiple membership assignments is a
combinatorial challenge, as the number of potential states grows as $2^{KN}$ rather than $K^N$ in single
membership models. In particular, standard Gibbs sampling approaches tend to get stuck in local
suboptimal configurations where single assignment changes are not adequate for the identification
of probable alternative configurations [11]. Consequently, there is a need both for computationally
efficient models that scale linearly in the number of links and for reliable inference schemes for
modeling multiple memberships.

In this paper, we propose a new non-parametric Bayesian latent feature graph model, denoted the
infinite multiple relational model (IMRM), that addresses the challenges mentioned above.
Specifically, the contributions of this paper are the following: i) We propose the IMRM, in which inference
scales linearly in the number of links. ii) We propose a non-conjugate split-merge sampling
procedure for parameter inference. iii) We demonstrate how the single membership IRM model implicitly
accounts for multiple memberships. iv) We compare existing non-parametric single membership
models with our proposed multiple membership counterparts in learning latent structure of a variety
of benchmark "real" size networks and demonstrate that explicitly modeling multiple-membership
results in more compact representations of latent structure.

2 Infinite multiple-membership relational model 

Given a graph, assume that each vertex $v_i$ has an associated $K$-dimensional binary latent feature
vector, $z_i$, with $K_i = |z_i|_1$ assignments. Consider vertex $v_i$ and $v_j$: For all $K_i K_j$ combinations of
classes there is an associated probability, $\rho_{k\ell}$, of generating a link. We assume that each of these
combinations of classes acts independently to generate a link between $v_i$ and $v_j$, such that the total
probability, $\pi_{ij}$, of generating a link between $v_i$ and $v_j$ is given by

$$\pi_{ij} = 1 - (1 - \sigma_{ij}) \prod_{k\ell} (1 - \rho_{k\ell})^{z_{ik} z_{j\ell}}, \qquad (2)$$

where $\sigma_{ij}$ is an optional term that can be used to account for noise or to include further
side-information as discussed previously. Under this model, the features act as independent causes of
links, and thus if a vertex gets an additional feature it will result in an increased probability of
linking to other vertices. In contrast to the model proposed by [12], where negative weights lead to
features that inhibit links, our model is more restricted. Although this might result in less power
to explain data, we expect that it will be easier to interpret the features in our model because links
are directly generated by individual features and not through complex interactions between features.
This is analogous to non-negative matrix factorization, which is known to form parts-based
representations because it does not allow component cancellations [10]. If the latent features $z_i$ have only a
single active element and $\sigma_{ij} = 0$, Eq. (2) reduces to $\pi_{ij} = \rho_{c_i c_j}$, i.e., the proposed model directly
generalizes the IRM model; hence, we denote our model the infinite multiple-membership relational
model (IMRM).
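
The noisy-or form of the link probability can be made concrete with a short sketch. The function name `link_probability` and the numeric values of the class link probabilities are assumptions for illustration only; the code evaluates one minus the product, over all active feature pairs, of the probability that the pair does not generate a link.

```python
def link_probability(z_i, z_j, rho, sigma_ij=0.0):
    """Noisy-or link probability: 1 - (1 - sigma_ij) * prod (1 - rho[k][l])^(z_i[k] * z_j[l])."""
    no_link = 1.0 - sigma_ij
    for k, z_ik in enumerate(z_i):
        for l, z_jl in enumerate(z_j):
            if z_ik and z_jl:  # only active feature pairs contribute
                no_link *= 1.0 - rho[k][l]
    return 1.0 - no_link


# Hypothetical class link probabilities for three classes.
rho = [[0.8, 0.1, 0.1],
       [0.1, 0.6, 0.1],
       [0.1, 0.1, 0.7]]

# Single memberships: the expression reduces to rho[0][1], as in the IRM.
single = link_probability([1, 0, 0], [0, 1, 0], rho)

# Giving the first vertex a second feature can only increase the link probability.
multi = link_probability([1, 1, 0], [0, 1, 0], rho)
```

In this example `single` equals 0.1 and `multi` equals 1 - 0.9 * 0.4 = 0.64 (up to floating point rounding), showing both the reduction to the single membership case and the monotone effect of adding a feature.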

The link probability model in Eq. (2) has a very attractive computational property. In many real
data sets, the number of non-links far exceeds the number of links present in the network. To
analyze large scale networks where this holds, it is a great advantage to devise algorithms that scale
computationally only with the number of links present. As we show in the following, our model
has that property. Assuming $\sigma_{ij} = 0$ for simplicity of presentation, we may write Eq. (2) more
compactly as $\pi_{ij} = 1 - e^{z_i^\top P z_j}$, where the elements of the matrix $P$ are $p_{k\ell} = \log(1 - \rho_{k\ell})$.
Inserting this in Eq. (1) we have

$$p(Y \mid Z, P) = \prod_{(i,j) \in \mathcal{Y}} \left(1 - e^{z_i^\top P z_j}\right)^{y_{ij}} \left(e^{z_i^\top P z_j}\right)^{1 - y_{ij}} = \left[\prod_{(i,j) \in \mathcal{Y}_1} \left(1 - e^{z_i^\top P z_j}\right)\right] \exp\left(\sum_{(i,j) \in \mathcal{Y}_0} z_i^\top P z_j\right). \qquad (3)$$



The exponent of the second term, which entails a sum over the possibly large set of non-links in the
network, can be efficiently computed as

$$\sum_{(i,j) \in \mathcal{Y}_0} z_i^\top P z_j = \sum_{k\ell} \left(\sum_{i=1}^{N} z_{ik} \sum_{j=1}^{N} z_{j\ell} - \sum_{(i,j) \in \mathcal{Y}_1 \cup \mathcal{Y}_?} z_{ik} z_{j\ell}\right) p_{k\ell}, \qquad (4)$$

requiring only summation over links and "missing" links. Assuming that the graph is not dominated
by "missing" links, the computation of Eq. (3) scales linearly in the number of graph links, $|\mathcal{Y}_1|$.
We presently consider latent binary features $z_i$, but we note that the model scales linearly for any
parameterization of the latent feature vector $z_i$, as long as $\pi_{ij} = 1 - e^{z_i^\top P z_j} \in [0, 1]$, which holds
in general if $z_i$ is non-negative.
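
The following sketch checks the efficient non-link computation against a brute-force sum on a toy example. It rests on stated assumptions: pairs are treated as ordered pairs over all vertices, self-pairs are placed in the "missing" set so that the identity holds exactly, and the helper names and toy data are invented for the illustration.

```python
import math
from itertools import product


def quad(Z, P, i, j):
    """The bilinear form z_i^T P z_j."""
    K = len(P)
    return sum(Z[i][k] * P[k][l] * Z[j][l] for k in range(K) for l in range(K))


def nonlink_sum_bruteforce(Z, P, links, missing):
    """Sum z_i^T P z_j over every ordered pair that is neither a link nor missing."""
    N = len(Z)
    excluded = links | missing
    return sum(quad(Z, P, i, j)
               for i, j in product(range(N), repeat=2)
               if (i, j) not in excluded)


def nonlink_sum_efficient(Z, P, links, missing):
    """Same quantity, touching only the links and the missing pairs."""
    N, K = len(Z), len(Z[0])
    m = [sum(Z[i][k] for i in range(N)) for k in range(K)]  # column sums of Z
    counts = [[0] * K for _ in range(K)]  # sum of z_ik * z_jl over links and missing pairs
    for i, j in links | missing:
        for k in range(K):
            if Z[i][k]:
                for l in range(K):
                    counts[k][l] += Z[j][l]
    return sum(P[k][l] * (m[k] * m[l] - counts[k][l])
               for k in range(K) for l in range(K))


# Toy data: 4 vertices, 2 features, hypothetical class link probabilities.
Z = [[1, 0], [1, 1], [0, 1], [1, 0]]
rho = [[0.7, 0.1], [0.1, 0.5]]
P = [[math.log(1.0 - r) for r in row] for row in rho]
links = {(0, 1), (1, 0), (1, 2), (2, 1)}
missing = {(i, i) for i in range(4)} | {(0, 3), (3, 0)}

brute = nonlink_sum_bruteforce(Z, P, links, missing)
fast = nonlink_sum_efficient(Z, P, links, missing)
```

The brute-force version visits all N^2 pairs, whereas the efficient version costs O(NK) for the column sums plus O(K^2) per link or missing pair, which is what makes the likelihood evaluation scale with the number of links rather than the number of vertex pairs.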

As in existing multiple membership models [11, 12] we will assume an unbounded number of
latent features. We learn the effective number of features through the Indian buffet process (IBP)
representation [6], which defines a distribution over unbounded binary matrices,

$$p(Z \mid \alpha) = \frac{\alpha^{K}}{\prod_{h \in [0,1]^N} K_h!} \exp\left(-\alpha \sum_{i=1}^{N} \frac{1}{i}\right) \prod_{k=1}^{K} \frac{(N - m_k)!\,(m_k - 1)!}{N!}, \qquad (5)$$

where $m_k$ is the number of vertices belonging to class $k$ and $K_h$ is the number of columns of $Z$
equal to $h$.

As a prior over the class link probabilities we choose independent Beta distributions,

$$\rho_{k\ell} \mid a_{k\ell}, b_{k\ell} \sim \mathrm{Beta}(a_{k\ell}, b_{k\ell}) \propto \rho_{k\ell}^{a_{k\ell} - 1} (1 - \rho_{k\ell})^{b_{k\ell} - 1}. \qquad (6)$$

This is a conjugate prior for the single membership models, where the parameters $a_{k\ell}$ and $b_{k\ell}$
correspond to pseudo counts of links and non-links respectively between classes $k$ and $\ell$.

2.1 Inference 

In the following we present a method for inferring the parameters of the model: the infinite binary
feature matrix $Z$ and the link probabilities $\rho_{k\ell}$. In the latent class model, when only a single feature is
active for each vertex, the likelihood in Eq. (3) is conjugate to the Beta prior for $\rho_{k\ell}$. In that case, $P$
can be integrated away and a collapsed Gibbs sampling procedure for $Z$ can be used [9]. This is not
possible in the IMRM; instead, we propose to sample $P \sim p(P \mid Z, Y)$ using Hamiltonian Markov
chain Monte Carlo (HMC), and $Z \sim p(Z \mid P, Y)$ using Gibbs sampling combined with split-merge
moves.

HMC for class link probabilities: Hamiltonian Markov chain Monte Carlo (HMC) [5] is an
auxiliary variable sampling procedure that utilizes the gradient of the log posterior to avoid the random
walk behavior of other sampling methods such as Metropolis-Hastings. In the following we do not
describe the details of the HMC algorithm, but only derive the required expressions for the
gradient. To utilize HMC, the sampled variables must be unconstrained, but since $\rho_{k\ell}$ is a probability we
make the following change of variable from $\rho_{k\ell} \in [0, 1]$ to $r_{k\ell} \in (-\infty, \infty)$:
$\rho_{k\ell} = \frac{1}{1 + \exp(-r_{k\ell})}$, $r_{k\ell} = -\log\left(\rho_{k\ell}^{-1} - 1\right)$. Using the change of variables theorem, the prior for the class link
probabilities expressed in terms of $r_{k\ell}$ is given by $p(r_{k\ell} \mid a_{k\ell}, b_{k\ell}) \propto e^{a_{k\ell} r_{k\ell}} \left(e^{r_{k\ell}} + 1\right)^{-(a_{k\ell} + b_{k\ell})}$. With this,
the relevant terms of the negative log posterior are given by

$$-\mathcal{L}_P = \log p(P \mid Z, Y) = c + \sum_{(i,j) \in \mathcal{Y}_1} \log\left(1 - e^{z_i^\top P z_j}\right) + \sum_{(i,j) \in \mathcal{Y}_0} z_i^\top P z_j + \sum_{k\ell} \left[a_{k\ell} r_{k\ell} - (a_{k\ell} + b_{k\ell}) \log\left(e^{r_{k\ell}} + 1\right)\right], \qquad (7)$$

where $c$ does not depend on $P$. From this, the required gradient can be computed,

$$\frac{\partial \mathcal{L}_P}{\partial r_{k\ell}} = \sum_{(i,j) \in \mathcal{Y}_1} \frac{z_{ik} z_{j\ell} \rho_{k\ell}}{1 - e^{-z_i^\top P z_j}} + \sum_{(i,j) \in \mathcal{Y}_0} z_{ik} z_{j\ell} \rho_{k\ell} + (a_{k\ell} + b_{k\ell}) \rho_{k\ell} - a_{k\ell}. \qquad (8)$$

Again, the possibly large sum over non-links in the second term can be computed efficiently as in
Eq. (4).
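
A finite-difference check is a simple way to validate the gradient used by HMC. The sketch below is written under several assumptions: $P$ is treated as an unconstrained $K \times K$ matrix without symmetry, pairs are ordered, the constant $c$ is dropped, every vertex has at least one active feature (so the exponent is strictly negative), and all names and numbers are invented for the example.

```python
import math


def rho_from_r(r):
    """Logistic map from the unconstrained r to a probability rho."""
    return 1.0 / (1.0 + math.exp(-r))


def log_one_minus_rho(R):
    K = len(R)
    return [[math.log(1.0 - rho_from_r(R[k][l])) for l in range(K)] for k in range(K)]


def quad(Z, P, i, j):
    K = len(P)
    return sum(Z[i][k] * P[k][l] * Z[j][l] for k in range(K) for l in range(K))


def neg_log_posterior(R, Z, links, nonlinks, a, b):
    """Negative log posterior in terms of r, up to an additive constant."""
    K, P, logp = len(R), log_one_minus_rho(R), 0.0
    for i, j in links:
        logp += math.log(1.0 - math.exp(quad(Z, P, i, j)))
    for i, j in nonlinks:
        logp += quad(Z, P, i, j)
    for k in range(K):
        for l in range(K):
            logp += a[k][l] * R[k][l] - (a[k][l] + b[k][l]) * math.log(math.exp(R[k][l]) + 1.0)
    return -logp


def gradient(R, Z, links, nonlinks, a, b):
    """Analytic gradient of the negative log posterior with respect to r."""
    K, P = len(R), log_one_minus_rho(R)
    rho = [[rho_from_r(R[k][l]) for l in range(K)] for k in range(K)]
    G = [[(a[k][l] + b[k][l]) * rho[k][l] - a[k][l] for l in range(K)] for k in range(K)]
    pairs = [(i, j, 1.0 / (1.0 - math.exp(-quad(Z, P, i, j)))) for i, j in links]
    pairs += [(i, j, 1.0) for i, j in nonlinks]
    for i, j, w in pairs:
        for k in range(K):
            for l in range(K):
                G[k][l] += w * Z[i][k] * Z[j][l] * rho[k][l]
    return G


def finite_difference(R, Z, links, nonlinks, a, b, h=1e-5):
    """Central finite-difference approximation of the same gradient."""
    K = len(R)
    G = [[0.0] * K for _ in range(K)]
    for k in range(K):
        for l in range(K):
            Rp, Rm = [row[:] for row in R], [row[:] for row in R]
            Rp[k][l] += h
            Rm[k][l] -= h
            G[k][l] = (neg_log_posterior(Rp, Z, links, nonlinks, a, b)
                       - neg_log_posterior(Rm, Z, links, nonlinks, a, b)) / (2.0 * h)
    return G


Z = [[1, 0], [1, 1], [0, 1], [1, 0]]
links, nonlinks = [(0, 1), (1, 2)], [(0, 2), (0, 3), (2, 3)]
a, b = [[5.0, 1.0], [1.0, 5.0]], [[1.0, 5.0], [5.0, 1.0]]
R = [[0.3, -1.2], [-0.7, 0.5]]

G_exact = gradient(R, Z, links, nonlinks, a, b)
G_num = finite_difference(R, Z, links, nonlinks, a, b)
max_err = max(abs(G_exact[k][l] - G_num[k][l]) for k in range(2) for l in range(2))
```

The link term uses the factor $1 / (1 - e^{-z_i^\top P z_j})$, which is negative because the exponent $z_i^\top P z_j$ is negative; raising a class link probability therefore lowers the negative log posterior contribution of an observed link, as expected.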

Gibbs sampler for binary features: Following [6], a Gibbs sampler for the latent binary features
$Z$ can be derived. Consider sampling the $k$th feature of vertex $v_i$. If one or more other vertices also
possess the feature, i.e., $m_{-ik} = \sum_{j \neq i} z_{jk} > 0$, the posterior marginal is given by

$$p(z_{ik} = 1 \mid Z_{-(ik)}, P, Y) \propto p(Y \mid Z, P) \frac{m_{-ik}}{N}. \qquad (9)$$

When evaluating the likelihood term, only the terms that depend on $z_{ik}$ need be computed, and the
Gibbs sampler can be implemented efficiently by reusing computation and by updating and downdating
variables.

In addition to sampling existing features, $K_1^{(i)} \sim \mathrm{Poisson}(\alpha / N)$ new features should also be associated
with $v_i$. [6] suggest "... computing probabilities for a range of values of $K_1^{(i)}$ up to some reasonable
upper bound ..."; however, following [11] we take another approach and sample the new features
by Metropolis-Hastings using the prior as proposal density. The values of $\rho_{k\ell}$ corresponding to the
new features are proposed from the prior in Eq. (6).

Split-merge move for binary features: A drawback of Gibbs sampling procedures is that only
a single variable is updated at a time, which makes the sampler prone to getting stuck in suboptimal
configurations. As a remedy, bolder Metropolis-Hastings moves can be considered in which
multiple changes of assignments help explore alternative high probability configurations. How these
alternative configurations are proposed is crucial in order to attain reasonable acceptance rates. A
popular approach is to split or merge existing classes as proposed in [8] for the Dirichlet process
mixture model (DPMM). Split-merge sampling in the IBP has previously been discussed briefly by

[11] and [12].



Inspired by the non-conjugate sequential allocation split-merge sampler for the DPMM [4], we
propose the following procedure: Draw two non-zero elements of $Z$, $(k_1, i_1)$ and $(k_2, i_2)$. If $k_1 = k_2$,
propose a split; otherwise propose to merge classes $k_1$ and $k_2$ into a joint cluster $k_1$. Accept the
proposal with the Metropolis-Hastings acceptance rate

$$a^* = \min\left(1, \frac{p(Z^*, P^* \mid Y)\, q(Z \mid Z^*)\, q(P \mid P^*)}{p(Z, P \mid Y)\, q(Z^* \mid Z)\, q(P^* \mid P)}\right).$$

In case of a merge, we remove $k_2$ and assign all its vertices to $k_1$, and we remove the
corresponding row and column of $P$ (this proposal is deterministic and has probability one). For a split, we
remove all vertices except $i_1$ from cluster $k_1 = k_2 = k$ and create a new cluster $k^*$ and assign $i_2$
to it. We then sample a new row and column $\rho^*_{k'\ell'}$ for the new cluster as described below. Next we
sequentially allocate [4] the remaining original members of $k$ to either $k$ or $k^*$ or both in a restricted
Gibbs sampling sweep, and refine the allocation through $t$ additional restricted Gibbs scans [8].

The proposal density for $\rho^*_{k'\ell'}$ is based on a random walk, $\rho^*_{k'\ell'} \sim \mathrm{Beta}(a_{k'\ell'}, b_{k'\ell'})$, where

$$b_{k'\ell'} = \max\left(1, (1 - \bar\rho_{k'\ell'})\, m_1 - 1 + \bar\rho_{k'\ell'}\right), \quad a_{k'\ell'} = \max\left(1, \frac{\bar\rho_{k'\ell'}}{1 - \bar\rho_{k'\ell'}}\, b_{k'\ell'}\right), \qquad (10)$$

such that $\rho^*_{k'\ell'}$ has mean $\bar\rho_{k'\ell'}$ and variance equal to the empirical variance, $\bar\rho_{k'\ell'} (1 - \bar\rho_{k'\ell'}) / m_1$. We
choose the mean of the random walk, $\bar\rho_{k'\ell'}$, such that the
new class has similar within- and between-class link probabilities to the original class, but such that
the class link probability between the original and new cluster is similar to the remaining
between-class link probabilities. This choice is crucial, since it favors splitting classes into two classes that
are no more related than the relation to the remaining classes.
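
The moment matching behind the Beta random-walk proposal can be checked numerically. This is a sketch under assumptions: the argument `m` stands for the count that appears in the variance expression, the function names are invented, and the example values are hypothetical. When neither `max` is active, a Beta distribution with these parameters has exactly the requested mean and the variance mean * (1 - mean) / m.

```python
import random


def beta_proposal_params(rho_bar, m):
    """Beta parameters matched to mean rho_bar and variance rho_bar * (1 - rho_bar) / m."""
    b = max(1.0, (1.0 - rho_bar) * m - 1.0 + rho_bar)
    a = max(1.0, rho_bar / (1.0 - rho_bar) * b)
    return a, b


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


# Hypothetical mean and count.
a, b = beta_proposal_params(0.3, 50)
mean, var = beta_mean_var(a, b)

# Drawing a proposal with the standard library Beta sampler.
rng = random.Random(0)
proposal = rng.betavariate(a, b)
```

Here a + b = m - 1, so the variance a * b / ((a + b)^2 (a + b + 1)) simplifies to mean * (1 - mean) / m. The clipping at 1 only takes effect for small counts or extreme means, in which case the match is approximate.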



3 Results 



Figure 2: IRM (upper) and IMRM (lower) analysis of single (left) and multiple membership HW
network (right). On the single membership data, both models find the correct class assignments. On
the multiple membership data, the IMRM finds the correct 10 classes, while IRM extracts 25 classes,
which through $\rho$ accounts for all combinations of classes present in the data.

Figure 3: AUC scores for the analysis of the six synthetically generated data sets.

Based on the HW, DB, and RM parametrizations of $\rho$, we compared our proposed IMRM to the
corresponding single-membership IRM [9]. We evaluated the models on a range of synthetically
generated as well as real world networks. We assessed model performance in terms of the ability to predict
held-out links and non-links. As performance measure we used the area under the curve (AUC) of the
receiver operating characteristic (ROC). We also computed the predictive log likelihood (not shown
here), which gave similar results. For comparison, we included the performance of several standard
non-parametric link prediction approaches based on the following scores,

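$$\mathrm{ComN}_{ij} = y_i^\top y_j, \quad \mathrm{DegPr}_{ij} = k_i k_j, \quad \mathrm{Jacc}_{ij} = \frac{y_i^\top y_j}{k_i + k_j - y_i^\top y_j}, \quad \mathrm{ShP}_{ij} = \frac{1}{\text{length of the shortest path between } v_i \text{ and } v_j},$$

where $k_i = \sum_j y_{ij}$ is the degree of vertex $v_i$.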
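
For reference, the four baseline scores can be sketched with their standard textbook definitions: common neighbors, degree product, Jaccard similarity, and inverse shortest path length. These definitions are an assumption and may differ in detail from the exact expressions used in the experiments; the function names and the toy graph are invented for the example.

```python
from collections import deque


def neighbors(edges, n):
    """Adjacency sets of an undirected graph with vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def common_neighbors(adj, i, j):
    return len(adj[i] & adj[j])


def degree_product(adj, i, j):
    return len(adj[i]) * len(adj[j])


def jaccard(adj, i, j):
    union = len(adj[i] | adj[j])
    return len(adj[i] & adj[j]) / union if union else 0.0


def inverse_shortest_path(adj, i, j):
    """1 / (shortest path length) by breadth-first search; 0.0 if i == j or unreachable."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return 1.0 / dist[u] if dist[u] else 0.0
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return 0.0


# Toy graph: a triangle 0-1-2 with a tail 2-3-4, plus an isolated vertex 5.
adj = neighbors([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)], n=6)
scores_03 = (common_neighbors(adj, 0, 3), degree_product(adj, 0, 3),
             jaccard(adj, 0, 3), inverse_shortest_path(adj, 0, 3))
```

When scoring a held-out pair, the pair itself has been removed from the graph, so each score is computed from the remaining observed links only.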

In all the analyses we removed 2.5% of the links and an equivalent number of non-links for
cross-validation. We analyzed a total of five random data splits, and all of the analyses were based on
2500 sampling iterations initialized randomly with $K = 50$ clusters. Each iteration was based on
split-merge sampling using sequential allocation with $t = 2$ restricted Gibbs scans followed by
standard Gibbs sampling. Our implementation of the IRM was based on collapsed Gibbs sampling
(i.e., integrating out $\rho$) as proposed in [9], but we also included a conjugate single-membership
split-merge step corresponding to the proposed non-conjugate split-merge sampler. The priors were
chosen as $\alpha = \log(N)$, $a_{kk} = 5$, $a_{k\ell} = 1\ \forall k \neq \ell$ and $b_{kk} = 1$, $b_{k\ell} = 5\ \forall k \neq \ell$, which renders the
priors practically non-informative.
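
The AUC used as the performance measure can be computed directly from the scores assigned to held-out links and non-links. The sketch below uses the pairwise (Mann-Whitney) form of the AUC, counting ties as one half; the function name `auc` and the example scores are assumptions for illustration, and the quadratic loop is meant for clarity rather than speed.

```python
def auc(scores_links, scores_nonlinks):
    """Probability that a held-out link scores higher than a held-out non-link (ties count 1/2)."""
    wins = 0.0
    for s in scores_links:
        for t in scores_nonlinks:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(scores_links) * len(scores_nonlinks))


# Hypothetical predicted link probabilities for held-out pairs.
held_out_links = [0.9, 0.8, 0.4]
held_out_nonlinks = [0.5, 0.3, 0.1]
score = auc(held_out_links, held_out_nonlinks)
```

In this example eight of the nine link/non-link pairs are ranked correctly, so `score` is 8/9. Because an equal number of links and non-links is held out, the measure is not dominated by the much larger set of non-links.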

Synthetic networks: We analyzed a total of six synthetic networks generated according to the HW,
DB and RM models based on the vertices having either one or two memberships to the underlying
classes. For the single membership models we generated a total of $K = 5$ groups each containing
100 vertices. For the HW generated network we set $\rho_c = 1$ and $\rho_0 = 0$, while for the DB generated
network we used within-community densities $\rho_k$ ranging from 0.2 to 1 while $\rho_0 = 0$. The RM
generated network had the same within-community densities as the DB network but included varying
degrees of overlap between the communities. The multiple membership models, denoted MHW, MDB
and MRM, were generated from the corresponding single membership models as $Y \vee R Y R^\top$ (where
$\vee$ denotes element-wise or and $R$ is a random permutation matrix with diagonal zero), such that each
vertex belongs to two classes.
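
The construction of a multiple membership network from a single membership one can be sketched as follows. The code assumes a smaller HW-style network than in the experiments (3 groups of 5 vertices, within-class probability 1, between-class probability 0, no self-links), represents the permutation matrix by an index list, and uses invented function names throughout.

```python
import random


def block_adjacency(K, size):
    """Single membership HW-style network: fully linked within groups, no links between groups."""
    N = K * size
    return [[1 if i != j and i // size == j // size else 0 for j in range(N)]
            for i in range(N)]


def random_derangement(n, rng):
    """Random permutation without fixed points, i.e. a permutation matrix with zero diagonal."""
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if all(perm[i] != i for i in range(n)):
            return perm


def overlay(Y, perm):
    """Element-wise or of Y with a copy of Y whose rows and columns are permuted by perm."""
    N = len(Y)
    return [[Y[i][j] | Y[perm[i]][perm[j]] for j in range(N)] for i in range(N)]


rng = random.Random(0)
K, size = 3, 5
Y = block_adjacency(K, size)
perm = random_derangement(K * size, rng)
Y_multi = overlay(Y, perm)

# Vertex i keeps its original class and also inherits the class of vertex perm[i].
memberships = [{i // size, perm[i] // size} for i in range(K * size)]
```

With a purely random permutation the inherited class can coincide with the original one for some vertices, so in this sketch a vertex has at most two, not always exactly two, distinct memberships.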

While the IMRM model explicitly accounts for multiple memberships, the IRM model can also
implicitly account for multiple memberships through the between-class interactions. To illustrate this,
we analyzed the generated HW and MHW data by the IRM model as well as the proposed IMRM
model (see Figure 2). When there are only single memberships, the IMRM reduces to the IRM
model; however, when the network is generated such that the vertices have multiple memberships,
the IMRM model correctly identifies the ($2 \cdot 5 = 10$) underlying classes. The IRM model on the
other hand extracts a larger number of classes corresponding to all possible ($5^2 = 25$) combinations
of classes present in the data. The estimated $\rho$ indicates how these 25 classes combine to form the



Figure 4: AUC scores for the analysis of the five real networks (panels: Yeast, USPower, Erdos,
FreeAssoc, Reuters911).



Table 1: Summary of the analyzed real networks: r denotes the network's assortativity, c the
clustering coefficient [19], L the average shortest path.

Network      N        Links     r       c      L      Description
Yeast        2,284    6,646     -0.10   0.13   4.4    Protein-protein interaction network [18]
USPower      4,941    6,594      0.00   0.08   19.9   Topology of power grid [19]
Erdos        5,534    8,472     -0.04   0.08   3.9    Erdos 02 collaboration network [2]
FreeAssoc    10,299   61,677    -0.07   0.12   3.9    Word relations in free association [13]
Reuters911   13,314   148,038   -0.11   0.37   3.1    Word co-occurrence [3]

10 underlying multiple membership groups in the network. As such, the IRM model has the same 
expressive power as the proposed multiple membership models but interpreting the results can be 
difficult when multiple membership community structure is split into several classes with complex 
patterns of interaction. 

Figure 3 shows the link-prediction AUC scores from the analysis of the six generated networks.
Results show that all models work well on data generated according to their own model or models
which they generalize. We also notice that the IRM model accounts well for multiple membership
structure, as discussed and illustrated in Figure 2. The HW and DB models on the other hand fail in
modeling networks with multiple memberships.

Real Networks: We finally analyzed five benchmark complex networks summarized in Table 1.
The sizes of most of the networks make it computationally infeasible for us to analyze them using
the existing multiple-membership approaches proposed in [1, 11, 12]. For all the networks, multiple
memberships are conceivable: In protein interaction networks such as the Yeast network, proteins
can be part of multiple functional groups; in social networks such as Erdos, scientists collaborate
with different groups of people depending on the research topic; and in word relation networks such
as Reuters911 and FreeAssoc, words can have multiple meanings/contexts. For all these networks,
explicitly modeling these multiple contexts can potentially improve on the structure identification
over the equivalent single membership models.

In Figure 4 the AUC link prediction score is given for the five networks analyzed. As can be seen
from the results, modeling multiple memberships significantly improves on predicting links in the
network. In particular, when considering the IHW and IDB models and the corresponding proposed
multiple membership models, the learning of structure is improved substantially for all networks
except USPower. Furthermore, it can be seen that the IRM model, which can also implicitly account
for multiple memberships, in general has a similar performance to the multiple membership models.
The poor identification of structure in the USPower network might be due to the fact that the average


Table 2: Top table: The number of extracted components for the IRM and IMRM models. Bold
denotes that the number of components is significantly different between the two models (i.e.
difference in mean is at least two standard deviations apart). Bottom table: cpu-time usage in hours
for 2500 IRM and IMRM sampling iterations.

        Yeast        USPower     Erdos        FreeAssoc    Reuters911
IRM     24.0 ± 0.8   8.6 ± 0.4   10.4 ± 0.3   58.6 ± 0.7   39.8 ± 2.1
IMRM    15.4 ± 0.9   6.8 ± 0.5   6.8 ± 0.6    15.6 ± 0.9   44.8 ± 1.0

        Yeast        USPower     Erdos        FreeAssoc    Reuters911
IRM     2.3 ± 0.1    4.0 ± 0.2   14.6 ± 5.9   30.1 ± 0.6   32.5 ± 5.4
IMRM    1.7 ± 0.1    8.9 ± 0.8   7.1 ± 0.5    28.1 ± 1.9   71.5 ± 3.2
path between vertices is very high, rendering it difficult to detect the underlying structure for any
but the most simple IHW model. While the IRM and IMRM perform equally well in terms of link
prediction, it can be seen in Table 2 that the average number of extracted components for the IMRM
model is significantly smaller than the number of components extracted by the IRM model for all
networks except the Reuters911 network, where no significant difference is found. As a result, the
IMRM model is in general able to extract a more compact representation of the latent structure of
networks. Table 2 also gives the total cpu-time for estimating the 2500 samples for each of the
networks using the IRM and IMRM, showing that the computational costs of
the two models are of the same order of magnitude.

4 Discussion 

While single membership models based on the IRM can indirectly account for multiple memberships,
as we have shown, the benefit of the proposed framework is that it allows for these multiple
memberships to be modeled explicitly rather than through complex between-group interactions based on
a multitude of single membership components. On synthetic and real data we demonstrated that
explicitly modeling multiple-membership resulted in a more compact representation of the inherent
structure in networks. We further demonstrated that models that can capture multiple memberships
(which includes the IRM model) significantly improve on the link prediction relative to models that
can only account for single membership structure, i.e., the IHW and IDB models. We presently
considered undirected networks, but we note that the proposed approach readily generalizes to directed
and bi-partite graphs. Furthermore, the approach also extends to include side information as
proposed in [12] as well as simultaneous modeling of vertex attributes [20]. We note, however, that the
inclusion of side information requires a linearly scalable parameterization in order for the overall
model to remain computationally efficient. An attractive property of the IRM model over the IMRM
model is that the IRM model admits the use of collapsed Gibbs sampling, which we have found to be
more efficient relative to sampling the non-conjugate multiple membership models where additional
sampling of the $\rho$ parameter is required. In future research, we envision combining the IRM and
IMRM model, using the IRM as initialization for the IMRM or by forming hybrid models.